GIST-IT: Combining Linguistic and Machine Learning Techniques for Email Summarization
Abstract
We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper: heterogeneous data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content. The GIST-IT application is fully implemented and embedded in an active mailbox platform. Evaluation was performed over three machine learning paradigms.
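To make the idea of combining shallow linguistic processing with learned ranking more concrete, the following is a minimal sketch and not the GIST-IT implementation. It assumes a tiny hand-written stopword list, uses stopword-delimited word runs as a crude stand-in for phrase chunking, and replaces the trained classifier with a hypothetical hand-weighted score over three simple features (frequency, position of first occurrence, phrase length). The names `STOPWORDS`, `candidate_phrases`, and `rank_phrases` are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "and", "or",
    "is", "are", "we", "will", "be", "at", "please", "this", "that",
    "with", "it",
}


def candidate_phrases(text):
    """Return runs of consecutive non-stopword tokens.

    This is a crude stand-in for shallow phrase chunking; a real system
    would use a part-of-speech tagger and a chunker instead.
    """
    phrases = []
    # Punctuation ends a phrase, so split the text into fragments first.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for token in fragment.split():
            if token in STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(" ".join(current))
    return phrases


def rank_phrases(text, top_k=3):
    """Rank candidate phrases with a hand-weighted score.

    The weights are hypothetical; in a learned system they would be
    replaced by a classifier trained on labeled phrases.
    """
    phrases = candidate_phrases(text)
    counts = Counter(phrases)
    first_pos = {}
    for index, phrase in enumerate(phrases):
        first_pos.setdefault(phrase, index)
    total = len(phrases)

    def score(phrase):
        frequency = counts[phrase]                      # repeated phrases matter more
        earliness = 1.0 - first_pos[phrase] / total     # earlier mentions score higher
        length_bonus = 0.5 * (len(phrase.split()) - 1)  # favor multi-word phrases
        return frequency + earliness + length_bonus

    return sorted(counts, key=score, reverse=True)[:top_k]


email = (
    "The budget meeting is on Friday. "
    "Please review the budget report for the budget meeting."
)
print(rank_phrases(email, top_k=2))
```

Swapping the `score` function for a trained model over the same features is the natural next step; the candidate extraction and the feature computation would stay as they are.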
Similar resources
Combining linguistic and machine learning techniques for email summarization
This paper shows that linguistic techniques along with machine learning can extract high quality noun phrases for the purpose of providing the gist or summary of email messages. We describe a set of comparative experiments using several machine learning algorithms for the task of salient noun phrase extraction. Three main conclusions can be drawn from this study: (i) the modifiers of a noun phr...
Extractive Automatic Summarization: Does more Linguistic Knowledge Make a Difference?
In this article we address the usefulness of linguistic-independent methods in extractive Automatic Summarization, arguing that linguistic knowledge is not only useful, but may be necessary to improve the informativeness of automatic extracts. An assessment of four diverse AS methods on Brazilian Portuguese texts is presented to support our claim. One of them is Mihalcea’s TextRank; other two a...
A Publicly Available Annotated Corpus for Supervised Email Summarization
Annotated email corpora are necessary for evaluation and training of machine learning summarization techniques. The scarcity of corpora has been a limiting factor for research in this field. We describe our process of creating a new annotated email thread corpus that will be made publicly available. We present the trade-offs of the different annotation methods that could be used.
Combining Different Summarization Techniques for Legal Text
Summarization, like other natural language processing tasks, is tackled with a range of different techniques, particularly machine learning approaches, where human intuition goes into attribute selection and the choice and tuning of the learning algorithm. Such techniques tend to apply differently in different contexts, so in this paper we describe a hybrid approach in which a number of differen...
Identifying relevant phrases to summarize decisions in spoken meetings
We address the problem of identifying words and phrases that accurately capture, or contribute to, the semantic gist of decisions made in multi-party human-human meetings. We first describe our approach to modelling decision discussions in spoken meetings and then compare two approaches to extracting information from these discussions. The first one uses an open-domain semantic parser that ident...
Publication date: 2001